Improving stochastic estimates with inference methods: 
calculating matrix diagonals 



Marco Selig, Niels Oppermann, and Torsten A. Enfilin 
Max- Planck- Institut fur Astrophysik, Karl-Schwarzs child- Strafle 1, 85741 Garching, Germany 
(Received 2011 August 02; Accepted 2012 February 23) 

Estimating the diagonal entries of a matrix, that is not directly accessible but only available as 
a linear operator in the form of a computer routine, is a common necessity in many computational 
applications, especially in image reconstruction and statistical inference. Here, methods of statistical 
inference are used to improve the accuracy or the computational costs of matrix probing methods to 
estimate matrix diagonals. In particular, the generalized Wiener filter methodology, as developed 
within information field theory, is shown to significantly improve estimates based on only a few 
sampling probes, in cases in which some form of continuity of the solution can be assumed. The 
strength, length scale, and precise functional form of the exploited autocorrelation function of the 
matrix diagonal is determined from the probes themselves. The developed algorithm is successfully 
applied to mock and real world problems. These performance tests show that, in situations where a 
matrix diagonal has to be calculated from only a small number of computationally expensive probes, 
a speedup by a factor of 2 to 10 is possible with the proposed method. 
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I. INTRODUCTION 

Imagine the inside of a black box, which computes an 
output in a concealed way from a given input, needs to 
be investigated. This is a frequent computational task in 
image reconstruction problems and it might also be nec- 
essary in reverse engineering the functionality of a given, 
already compiled computer code. In this work, the black 
box acts as a linear operation on its high-dimensional 
input, which can be freely chosen in order to probe the 
box. 

However, an internal property of the box that is of 
interest might be obfuscated and therefore needs to be 
inferred indirectly by analyzing suitable probes. In prac- 
tice, the obfuscation could be just due to the complexity 
of the box-internal operations, as it happens in image 
reconstruction problems. These operations could be any 
combination of many possible linear operators, like scalar 
or componentwise multiplications, vector space rotations, 
Fourier transformations, linear equation solvers, etc. The 
output, however, is assumed to be fully available. 

Mathematically, the linear operation performed by the 
black box can be represented by a matrix in case the in- 
put and output spaces are of finite dimension. In case 
these spaces have the same dimension, the matrix is a 
square matrix. In this paper, we are interested in the 
diagonal of a matrix for which input and output space 
can be identified with each other, e.g., in image recon- 
struction problems by component- or pixelwise corre- 
spondence. There, the matrix diagonal expresses how 
much a component of the input vector imprints onto the 
corresponding component of the output vector. This ma- 
trix diagonal, or any other component of the matrix rep- 
resenting the black box operator, can be probed by feed- 
ing a number of test vectors into the box and analyzing 



the results. 

The simplest, but most accurate, scheme is to probe 
any of the diagonal components sequentially, one by one. 
This can be done by choosing an input vector that con- 
tains only one nonzero component at the location of a 
certain pixel, and repeating this for all pixels. The ma- 
trix entries can then be read off from the output vectors. 
This scheme, however, is computationally very expensive 
if the dimension of the involved vector spaces is very high. 
In image reconstruction, one deals with the space of all 
possible images, which has the dimension of the num- 
ber of image pixels. Probing these sequentially is nearly 
always computationally prohibitive. 

A cheaper, but less accurate, scheme is to feed white 
noise vectors into the black box and to correlate the out- 
puts to the inputs component- or pixelwise. Averaging 
these correlations over a sufficiently large set of input vec- 
tors yields an estimator for the desired matrix diagonal. 

A first proposal for such a probing scheme can be 
found in the work by Hutchinson [TJ and references 
therein]. There, the functionality and efficiency of 
probing for obtaining trace estimates has been proven. 
Bekas, Kokiopoulou and Saad [5] extended the probing 
to Hadamard vectors (rows of the Hadamard matrix) to 
improve the estimation of diagonals of banded matrices. 
Those methods have been oriented to applications in den- 
sity functional theory. Similar problems were approached 
by Tang and Saad [21 and references therein] with a fo- 
cus on nonstochastic estimators. The recent paper by 
Aune and Simpson [4] transfers the probing technique to 
the field of information theory, in particular to the cal- 
culation of log-likelihoods. Finally, in the extensive work 
of Rohde and Tsybakov [5], the noise corrupted obser- 
vation of unknown matrix entries is investigated from a 
more mathematical point of view. 
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In these schemes, the number of required probes is 
much smaller than in the simple sequential probing 
scheme. However, this comes at the price of a lower ac- 
curacy of the result due to the stochastic nature of these 
schemes, which leads to sample variance or uncertainty 
in the result. The situation is completely analogous to 
a noise contaminated signal measurement. Higher accu- 
racy can be achieved by calculating or measuring more 
samples, since the noise will average out in the long run. 
This, however, might become computationally too expen- 
sive at some point. 

Noise suppression is a common theme also in the dis- 
cipline of image and signal reconstruction. Some a priori 
knowledge on signal properties may exist, like the pres- 
ence of spatial smoothness, or the knowledge of limited 
variance of certain image features (pixels, regions, Fourier 
modes, etc.). This knowledge can be exploited in order 
to reduce the impact of the noise on the reconstructed 
image beyond the simple noise averaging. 

Here, we propose to use well-suited image reconstruc- 
tion techniques, like generalizations of the Wiener filter, 
for an improved reconstruction of the matrix diagonals. 
We show that when determining the diagonals of covari- 
ance matrices, as needed in problems of image reconstruc- 
tion, the usage of suitable filters can improve the accu- 
racy of the inferred matrix diagonal significantly. The 
method proposed here yields more accurate results for a 
given amount of CPU time or, respectively, can reach a 
given level of accuracy earlier. In the examined applica- 
tions, we found a performance increase by a factor of 2 
to 10. This definitely enlarges the number of signal re- 
construction problems which can be tackled, and might 
enable computations which were marginally prohibitive 
before. 

The remainder of this paper is structured as follows. 
In Sec. [nj we highlight the importance of obtaining esti- 
mates for matrix diagonals, in particular of uncertainty 
covariance matrices in the context of information field 
theory (IFT) for signal reconstruction. The stochastic 
probing approach is reviewed in Sec. |III| In Sec. |IV| 
we propose using filters derived within the framework of 
IFT to solve the problem of estimating matrix diagonals. 
Both methods are verified by simple mock examples as 
well as by a real-world problem in Sec. |Vj We conclude 
in Sec. ED 



II. PROBLEM OF MATRIX DIAGONALS 

Linear operators are fundamental in any area of com- 
putation and thereby often expressed in their matrix rep- 
resentation. 

In the field of information theory, the covariance ma- 
trix of a quantity (which equals the inverse of the pre- 
cision matrix) holds a key role. To stress this, let us 



consider a multidimensional zero-mean Gaussian, 

with the covariance matrix X = ((p(p J )g, where ip is a 
random field defined over some pixelized vector space, 
j denotes a transpose, and ( • )g ^ s the expectation value 
weighted by this Gaussian. (In this context "pixel" 
is to be understood as a discretized coordinate which 
elsewhere may be referred to as "grid point", "bin" or 
"voxel".) 

A diagonal entry of the covariance matrix is the 
squared standard deviation Ui assigned to pixel i and 
expresses the pixelwise uncertainty in ip, 

a\ = ($) g = X tl . (2) 

A sophisticated and effective reconstruction tool is the 
generic filter [5] that we will review in the following. 
We provide this review in order to further emphasize 
the importance and problem of obtaining matrix diag- 
onals for stochastic inference. Furthermore, this filter 
forms also the basis of our proposed algorithm discussed 
in Sec. HVBl 

A. Generic filter 

Signal inference focuses on the reconstruction of some 
signal s in order to explain a set of measurements d, both 
of which are connected by a forward data model, 

d = R[s] + n, (3) 

where n is the noise and R a (not necessarily linear) re- 
sponse operator that maps from signal to data space. 

The generalized Wiener filter, as derived, e.g., in 
Ref. [7] in a Bayesian framework, is for one thing based 
on a linear forward data model, 

d = Rs + n, (4) 

where the data d is a sum of the signal response R s and 
the noise n. In this scenario, the response is a linear oper- 
ator that inherits all aspects of the signal measurement. 
The generalized Wiener filter arises in case one can as- 
sume a Gaussian distribution for the signal's prior and 
the signal- independent noise in addition to the described 
forward data model, 

P(s\S)=g(s,S), (5) 
P(n\N)=G(n,N), (6) 

where S and N stand for the signal and noise covariance 
matrix, respectively. The choice of zero-mean Gaussians 
shall solely simplify the notation at this point and does 
not present a restriction for the theory. In fact, choosing 
an appropriate nonzero mean is often reasonable, as we 
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demonstrate in Sec. IV B In consequence of the forward 



model, the likelihood of the data given the signal and 
its response is modeled by the signal-independent noise 
distribution, 1 

P(d\s, R, N) = P(n = d - R s\s, R, N) 

= g(d-Rs,N). (7) 

The actual inverse problem of estimating the signal given 
the data leads to a posterior that, logically, is also a Gaus- 
sian, 



P(s\d) = g(s-m,D), 



(8) 



with the mean m and the covariance D that encodes 
the a posteriori signal uncertainty. The resulting filter 
formula, whose straightforward derivation is detailed in 
Refs. [5H5], reads 



(S' 1 + R J N~ 1 R) 1 (RJN^d) , 



(9) 



D 



where the map m is the Bayesian estimator for the sig- 
nal, i.e. its posterior mean, D is referred to as information 
propagator and j as information source in IFT [7j in close 
analogy to quantum field theory. Both the signal covari- 
ance S and the noise covariance N needed for this filter 
are here assumed to be known. A convenient description 
of these covariances is in terms of the power spectra, the 
spectra of the eigenvalues of these matrices, as well as the 
eigenbases in which the covariance matrices are diagonal. 

In the following, we decompose the signal covariance 
as 5* = *^2[CiSi, where the Si are projection operators 
that project onto eigenspaces, the spectral bands (which 
correspond to the spherical harmonics of equal degree I 
in our examples in Sec.fVj). An analogous decomposition 
exists for the inverse S = ^2iCf 1 Sf 1 , where Sf 1 is 
the pseudoinverse of Si. 

The signal's power spectrum might be unknown a pri- 
ori, whereas the eigenbasis can often be guessed from 
statistical symmetries (e.g., the spherical harmonics ba- 
sis in case of a statistically isotropic distribution on the 
sphere). Thus the spectral coefficients Ci allow for a 
parametrization of the covariance matrix. In such ap- 
plications without spectral knowledge, the generalized 
Wiener filter can be extended to a generic filter derived 
in Ref. [BJ. For this purpose, first, a logarithmically flat 
prior is assumed for the unknown spectral coefficients. 
Second, the C;-marginalized posterior for the signal is 
calculated. This posterior then allows for the reidentifica- 
tion of appropriate terms with the spectral coefficients. 2 



The resulting generic filter formulas are Eq. ^ comple- 
mented by a reconstruction rule for the power spectrum, 
i.e., for each spectral coefficient one calculates 



Ci = 



1 



Ql + 2e; 



tr Ur 



■8iD)S; 



(10) 



where gi = tr [S^ -1 ^] are the numbers of degrees of free- 
dom for each spectral band. The parameters (5i, e;) char- 
acterize the different filter options: two specific forms are 
the classical filter, for which one chooses ($;,£/) = (0,0), 
and the critical filter, for which (5i,ei) = (1,0). The for- 
mer can be derived from a "classical" maximum a poste- 
riori approximation of the spectral uncertainty marginal- 
ized problem. The latter is called "critical" because it ex- 
hibits (in contrast to the classical filter) only a marginal 
perception threshold. For a filter with a perception 
threshold, the signal-to-noise ratio of a spectral mode 
has to exceed a certain threshold in the data before the 
filter recognizes it at all. There exists a critical line in the 
(5-e-plane separating filters that fully suppress bands with 
insufficient spectral power from filters that do not. The 
critical filter resides exactly on this line, while the classi- 
cal filter is in the region with such a perception threshold. 

All in all, Eqs. ^ and ( [l0| provide an iterative 
scheme for the full inverse problem of signal reconstruc- 
tion with unknown power spectrum, i.e., unknown corre- 
lation structure. The signal reconstruction benefits from 
the additional spectral information recovered from the 
data since it encodes internal structure of the signal. If, 
in addition, one lacks the a priori knowledge about the 
correlation structure of the noise, an analogous approach 
is conceivable; cf. [5]. 

The result of the inference is, first, a map m describ- 
ing the posterior mean field of the signal in order to ex- 
plain the given data d and, second, a covariance matrix 
D that encodes the underlying uncertainty correlation 
structure. The matrix diagonal of D is thereby of spe- 
cial importance, since it expresses the pixelwise variance, 
cf. Eq. ([2]), allowing one to make a statement about the 
uncertainty of the result in each pixel. 

In order to apply the critical or other generic filters 



we may need to calculate the trace of D S, in Eq. ( 10 ) 



in each iteration, and we have to evaluate the diagonal 
of D in order to interpret the reliability of our results. 
This motivates our ambition to develop faster and more 
accurate matrix probing schemes. 

Generic filters are applied, e.g., in Refs. [51 IMTT]. 



B. Exact matrix diagonal 



1 All noise contributions are marginalized over in an interme- 
diate step, P(d\s, R, N) = fVn P(d\s,n,R)P(n\N), where 
P(d\s, n, R) = S(d — Rs — n) according to Eq. Q. 

2 Equation ( | 1 [ ) (with 5j = 1) can alternatively be derived as the 
maximization of a signal-marginalized posterior with respect to 
(the logarithm of) the spectral coefficients. 



The diagonal of the uncertainty covariance D is a quan- 
tity of interest, but unfortunately not directly accessible 
in most cases. Its calculation involves complex matrix 
operations, such as matrix inversion; see Eq. jjjj. Of- 
ten, the complete matrix is not known explicitly, only 
the matrix-vector multiplication is available as a black 
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box in the form of a computer routine which reads in 
and returns a vector. 

Calculating the diagonal of a matrix X of dimension r 
seems still possible using the canonical basis vectors 
(with ef ] =5 lk yi, ke {!,..., r}), 



since 



diag[X] = ^e (fe) *Ae (fc) , 

k 



(11) 



where * denotes a componentwise product in the way 
that (a * b)i = atbi V i € {1, . . . , r}. 

It is obvious that this "true" diagonal is too ex- 
pensive computationally because one needs to evaluate 
the matrix-vector multiplication exactly r times looping 
through all canonical basis vectors where the dimension 
r of the problem can be very high (r» 1). In addition, 
each of those products alone can be expensive because it 
may invoke numerical inversion techniques, e.g., a con- 
jugate gradient scheme [112 , which is the case in most of 
the examples in Sec. IVl 



III. PROBING ESTIMATE 

The question arises if one can choose another set of 
vectors instead of the full set of canonical basis vectors to 
speed up the computation. Independent and identically 
distributed random variables stored in a set of vectors 
{£} (with sample size |{£}| = A) work out if they fulfill 
the property 



tee. 



A— >oo 



» 5i 



(12) 



The average 



stands for the arithmetic mean over 



a set {£} and Sij for the Kronecker delta. 

Two of many possible options are (i) equally probable 
values of ±1 for the components of £ pQ 3 or (ii) zero-mean 
Gaussian random numbers with unit variance. Both were 
originally developed for trace estimation. We use (ii) in 
the following applications. 

Regardless of the choice of the random vectors, the 
sample average 



(^XO U} ^diag[l] 



(13) 



over an infinite set results in the "true" diagonal; see the 
Appendix [S] 

The average over a finite but sufficiently large set 
(A < r < oo) gives the probing estimate / of the ma- 
trix diagonal, 



diag £>«}=/ 



(14) 



3 In Ref. 2 , a much more sophisticated choice, based on Ref. pQ, 
is presented. 



Given this estimator, a trace estimate is obtained by sum- 
ming up all elements of /, as 



tr[X]«(£T*0{e}=X> 



(15) 



Since one wants to obtain an estimator in a finite period 
of time, one has to find an acceptable trade-off between 
the sample size A and the residual error, where the latter 
scales with 1/yA according to the law of large numbers. 
Aiming for a certain precision, therefore, requires a par- 
ticular amount of computation time. 

The estimator given by Eq. ( fl4| is absolutely generic 
and applicable to a variety of matrices. Recent applica- 
tions of it can be found in Refs. [U |U [TU] . 



IV. BAYESIAN ESTIMATE 

A. Forward model 

Instead of doing a simple averaging of the probes, we 
now want to develop a Bayesian estimate which exploits 
additional knowledge of the problem to infer the matrix 
diagonal from a smaller set of samples. For this pur- 



pose, we consider the sampling described by Eq. ( 14 1 as 



a linear forward model of a measurement process for the 
signal s — diag [X] we are interested in. (In order to 
avoid confusion, already introduced synonymous quanti- 
ties that appear now in another context are marked with 
a tilde.) 

For one sample, a G {1,...,A}, the measurement 
equation takes the form 



JO) = * x f W 

= diag[(^ ) ) 2 ,...,(^)) : 



s + n 



&(«) 



(16) 



For all samples, it is 

d= (dW,...,d^y 



R s + n, 



(17) 



where d represents the "measured" data, R the signal 
response, and fi the noise. The contributions from all 
off-diagonal matrix elements are considered to be noise, 
i.e., 



*(X-diag[X u ,...,X rr ]) £ 



(a) 



(18) 



and they can be estimated using Eq. (17), once we have 
an estimator for the signal. 

Note that, if one chooses the random variables £ to be 
±1, first, one does not have to draw normal variables as 
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originally pointed out by Ref. Q] an d, second, all the re- 
sponse martices M a ^ equal 1 and hence do not need to be 
treated separately for the different samples. This speeds 
up the algorithm and reduces the memory requirements. 



B. Proposed algorithm 

Our goal is to find an estimator for the matrix diag- 
onal which is close to the minimum mean square error 
estimate, but still computationally affordable. This esti- 
mator has to account for our missing knowledge about 
the underlying correlation structure. Given these re- 
quirements, the generic filter formulas are potentially an 
appropriate choice. Therefore, our proposed algorithm is 
based on this filter. 

We start by probing the matrix as described in Sec. |III| 
and as a result obtain a first estimator / for our signal, 
i.e., the matrix diagonal. This additional information 
changes our state of knowledge about the matrix diago- 
nal in the way that the assumed prior in Eq. ^ is not 
adequate. Although / is a sufficient starting value for 
the iterative part of the scheme, it is not suitable for the 
construction of a prior, since, after only a few samples, 
several diagonal entries of the matrix may still be consid- 
erably over- or underestimated. By contrast, / provides 
already a sufficiently accurate estimator for the trace; 



Equations ( 20 1 to ( 22 ) are solved iteratively in the fol- 



see Eq. (15). For that reason, we can a priori expect 



the matrix diagonal s to be distributed around some t 
rather than around zero, where for all i € {1, . . . , r} we 
set U = Y,jf]/ r ~ ( tr PQ{f}/ dim [ X ]- Therefore, the 
prior of the matrix diagonal is chosen to have a nonzero 
mean t, 



p(s,s) = g(s-t ) §). 



(19) 



As a consequence, the filter formulas Eqs. ^ and (10) 
undergo a shift, 



m = D[ R T N~ 1 d + S~H 



RJN R 



D = IS- 1 
1 



Qi + 2e; 



tr 



((m-t) (m-i) J + 6iD^ S; 



(20) 
(21) 
(22) 



Furthermore, the noise covariance, i.e., its required in- 
verse, is unknown a priori and needs to be estimated for 
our algori thm. If we use the data model described in 
TV -1 can be approximated by the noise given 

(23) 



Sec. IV A 



the data and an estimator for the signal 

n = d — R rh. 
We simplify our calculation by using 



A" 1 = (nn T )~ 



(diag [n * h] ) 



(24) 



This is done in order to limit the computational effort, 
and it can be shown that this corresponds to the treat- 
ment of an unknown noise covariance presented in Ref. [9 
by means of a classical filter. 



lowing scheme 

(1) Start with m (l/=0) = / according to Eq. (Ml) 

(2) Compute h {v+1) according to Eq. ((23). 



(3) Compute +1) according to Eq. \22\, 
while ignoring t and D for v = 



(4) Compute m (,/+1) according to Eq. p0| 
using Eqs. plj) and p4|. 



(5) Repeat steps ([2| to Q until convergence. 

As an initial guess for the power spectrum in step ([3]) , we 
use an overestimation. This accelerates the convergence 
process as can be seen in the extreme limits: C; — > oo : 
rh ~ R- X d, whereas Ci — > + : rh ~ t, i.e., a strong 
overestimate still gives a nontrivial result for rh, whereas 
a strong underestimate gives a nearly trivial one. 

Following Sec. |II A[ we generally recommend the criti- 
cal filter, since it does not exhibit a significant perception 
threshold. Nevertheless, in the presented examples, the 
correction term trJ-DSj -1 ] contributes only marginally to 
the accuracy and therefore the classical filter, which does 
not require the calculation of this term, is applied in the 
following. 



V. VERIFICATION & APPLICATION 
A. Numerical experiments 

To verify the proposed algorithm, we perform some nu- 
merical experiments that are posed on signals living on 
the sphere. The examples in this section are represented 
by all-sky HEALPiX Q]2 E] maps with A sidc = 8, re- 
sulting in r = 768 pixels and in a maximal spectral index 
( mal = 23 of the spherical harmonics basis in which we 
a priori assume our signal covariance to be diagonal due 
to a statistical isotropy of the signal. 

The computations were performed using the free open- 
source mathematics software system Sage [15] and the 
HEALPix package. 



1. Trivial case 

At first, we consider a trivial case where the matrix 
in question is given explicitly. To ensure that this ma- 
trix is a valid covariance matrix, i.e., it is positive and 
symmetric, we constructed it to be 



X 



/A n 
-1 



V o 



-1 



\ 



- 1 X rr J 



(25) 
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Figure 1: Trivial case: (a) the exact matrix diagonal, (b) the probing estimate, and (c) the Bayesian estimate started with four 



probes, both after around 0.3 s. 



prediction 
probing estimator 
Bayesian (A=4) 
Bayesian (A=9) 




Figure 2: (Color online) L 2 -norm of the error (divided by 
the number of pixels r) as a function of CPU time for the 
trivial case: the evolution of the probing estimator (solid), its 
theoretical prediction oc XjyA (dotted) and the Bayesian esti- 
mators starting with four (dashed) and nine samples (dashed 
dotted) are shown. 



where the diagonal entries need to fulfill Xa > 2 V i E 
{1, . . . , r} for positive definiteness and are drawn with a 
simple structure on the sphere; see Fig. [I] 

The normalized L 2 -norms of the residual error 4 serve 
as an accuracy measure and are shown as a function of 
CPU time in Fig. [2] 

Although X as an operator could be implemented very 
efficiently, we use the much more expensive full matrix 
multiplication to have realistic computational costs like 
those of applying a more complex matrix. But this trivial 
case should only hold as a proof of concept; more sensible 
examples are discussed in the following. 

As one can clearly see in Fig. [2j the probing estima- 
tor improves continuously with an increasing number of 
probes and shows an overall proportionality to \j\[A as 
argued in Sec. |III| However, the Bayesian estimator given 
a set of samples converges in only a couple of iterations 
to a result with an accuracy the pure probing will first 



reach after investing a factor of a few more CPU time. 
For a fixed amount of computation time, the Bayesian 
estimator excels the probing estimator, as can be seen 
in Fig. [I] It is also evident that the Bayesian estimator 
must reach a lower limit in its progress because only a 
limited data set is provided containing finitely accurate 
information. 



2. Realistic case 



For a more realistic mock example, we consider a co- 
variance matrix D = (S^ 1 + A -1 ) -1 similar to the one 
described by Eq. ^ where the signal covariance S is 
completely defined by a power spectrum, 



Ci oc (max{l, l})~ 



(26) 



The noise covariance A is characterized by two effects, 
first, high noise in one of the twelve HEALPiX basis 
pixels representing a defect in the detector and, second, 
smoothly increased noise toward the poles imitating an 
observational effect. 5 The described noise covariance and 
the resulting propagator D are illustrated in Fig.[3j where 
one can see the conservation of the noise structure and 
the smoothing effect of the power spectrum. 

The performance of both algorithms is shown in Fig. [4] 
Our algorithm performs qualitatively in the same way as 
in the trivial case, but the overall gain in accuracy or 
time is quantitatively lower. It is also noticeable that 
the relative advantage of the proposed method decreases 
with the number of used random vectors. Consequently, 
the matrix diagonal inference method pays off best in 
cases where a rough estimate using only a few probes is 
sufficient. 



4 Meaning mathematically ||diag [X] — f\\a/T or ||diag [X] — m||2/V, 
respectively, to be exact. 



The noise variance is assigned to each pixel i according to JV. . = 
(0.005 + 8(/ii(hj - ftmax)) 2 //imax)> where h t is the HEALPix 
ring number associated to the pixel i. 
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Figure 3: Realistic case: (a) the matrix diagonal of the mock noise covariance N and (b) the "true" diagonal of the propagator 
D = {S' 1 +N- 1 )- 1 . 
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exact calculation of the diagonal via Eq. (111. We do this 
by reducing the resolution of the all-sky map with respect 
to the one presented in Ref. [10] to HEALPix parameter 
Aside = 16, leading to 3 072 pixels, and truncating the re- 
constructed power spectrum at l max = 47. Furthermore, 
we use a coarser version of the response matrix. In this 
way, a coarse-grained version of the propagator D is de- 
fined and we can calculate its diagonal exactly, as well as 
by using the probing estimate and our Bayesian exten- 
sion. 



Figure 4: (Color online) Same as Fig. [5] only for the propa- 
gator for the realistic case. 



B. Faraday sky uncertainty 

Next, we attempt to use our algorithm in a real phys- 
ical application. We consider the inference problem dis- 
cussed in Ref. [TO]. In that work, an all-sky signal - the 
galactic Faraday depth - was reconstructed from mea- 
surements in 37 543 individual directions [16]. The data 
were modeled according to a linear measurement proce- 
dure, Eq. Q , with a response R that is nonzero only for 
directions in which measurements had been made and is 
larger within the galactic plane. 

To reconstruct the field, the critical filter algorithm 
that was discussed in Sec. |II A| was used, yielding an es- 
timate for the posterior mean m of the signal field s, 
as well as an estimate for the components of the angu- 
lar power spectrum of this field, C\. In addition, a map 
showing the uncertainty of the signal estimate to, given 
by diag [D] , is provided in Ref. [TD] . This was calculated 
from the information propagator D, which takes on the 
form (21), by applying the probing estimator discussed 
inSecHInl 

Here, we show how the application of our Bayesian al- 
gorithm to this problem can improve the accuracy and 
speed up the calculation of this matrix diagonal. In or- 
der to be able to compare the results of the probing and 
Bayesian estimators to the correct matrix diagonal, we 
reduce the dimensionality of the problem to facilitate the 



Figure [5] shows the results of these calculations, where 
the matrix-vector multiplication was conducted for ten 
different random vectors in the case of the probing and 
Bayesian estimators. The sphere in this case corresponds 
to a map of the whole sky, where the disk of the Milky 
Way extends horizontally in the middle of the image. 
While still exhibiting a large amount of noise, both the 
probing and Bayesian results show roughly the right 
structure after only a few iterations. This structure is de- 
termined mainly by an oval region of high uncertainty, i.e. 
large diagonal entries, where no observations had been 
made and a dependence on galactic latitude due to the 
larger signal response within the galactic plane. 



From Fig. [5] alone, it is hard to judge whether the 
Bayesian estimator leads to an improvement over the 
probing one. We therefore plot again the L 2 -norm of the 
difference between the estimated matrix diagonal and the 
"true" one as a function of CPU time in Fig. [6] Shown 
is the curve for the pure probing estimator as well as 
two examples for Bayesian improvements, using two and 
ten random vectors, respectively. It is evident that for 
both cases the Bayesian method gives a boost in accuracy 
with only marginal time consumption. The absolute and 
relative improvement is larger if one uses fewer random 
vectors. This shows again that the main strength of the 
Bayesian method does not lie in the absolute accuracy 
that can be reached, but rather in the speedup it pro- 
vides for obtaining an estimate for the matrix diagonal 
with intermediate accuracy. 
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Figure 5: Diagonal of the propagator for the reconstruction of the galactic Faraday depth. Panel (a) shows the result of the 



exact calculation according to Eq. (Ill, panel (b) the probing result after ten iterations, and panel (c) the result of the Bayesian 
estimator, using ten random vectors as well. 
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Figure 6: (Color online) Same as Fig. [5] only for the propa- 
gator for the reconstruction of the galactic Faraday depth. 



rithm is especially effective when matrix diagonals need 
to be calculated only roughly, since the relative gain in 
accuracy is larger if only a few probes are available. 

This has been shown in numerical examples as well as 
for the uncertainty map appearing in the reconstruction 
of the galactic Faraday depth. 
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VI. CONCLUSIONS 



Appendix A: Proof of the probing estimator 



This work aims at transferring successful signal infer- 
ence methods to the realm of numerical computation. In 
particular, the problem of inferring internal properties 
of a computational black box is analyzed. To this end, 
the well known method of stochastic probing of matrix 
diagonals is augmented by a sophisticated signal recon- 
struction scheme. This inference algorithm interprets the 
probing of the matrix diagonal as a numerical measure- 
ment including unwanted uncertainties, or plainly noise. 

As in signal reconstruction, additional knowledge on 
the matrix diagonal, i.e., our signal, can be exploited 
to improve the accuracy of the obtained result. We use 
known symmetries of the underlying continuous structure 
of the matrix diagonal, namely approximate statistical 
isotropy. 

This improves the estimates acquired from probing, 
without increasing computational costs significantly. In 
order to achieve the same level of accuracy, the tra- 
ditional probing method requires computational costs, 
which are a factor of 2 to 10 times larger than the pro- 
posed scheme, in the cases investigated here. This algo- 



Hcrc we prove that Eq. ( |13[ ) is indeed implied by 
Eq. (12 1. Given a sufficiently large but not necessarily 



finite set {£} (with |{£}| = A), the condition given by 
Eq. (12) becomes 



A-^)m = A-iE^=^. (Ai) 



0=1 



Inserting this equality in Eq. (13), without loss of gener- 



ality restricted to the average's component i g {1, . . . , r}, 



3 = 1 



proves the statement. 
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